Take-home Exercise 2

Author

QIU RUILIU

Published

December 6, 2023

Modified

December 17, 2023

Setting the Scene

The inquiry focuses on the key motivators prompting city residents to rise early for their daily commutes from home to work, and the consequences of discontinuing public bus services along specific routes. These issues represent significant challenges for transport operators and urban planners.

Traditionally, understanding these dynamics involved conducting extensive commuter surveys. These surveys, however, are expensive, time-intensive, and laborious. Moreover, the data collected often requires extensive processing and analysis, leading to reports that are frequently outdated by the time they are completed.

With the digitalization of urban infrastructure, including public buses, mass rapid transit systems, public utilities, and roads, new opportunities for data collection arise. The integration of pervasive computing technologies like GPS in vehicles and SMART cards among public transport users allows for detailed tracking of movement patterns across time and space.

Despite this, the rapid accumulation of geospatial data has overwhelmed planners’ capacity to effectively analyze and convert it into valuable insights. This inefficiency negatively impacts the return on investment in data collection and management.

Motivation and Objective

The purpose of this take-home project is twofold. First, it addresses the gap in applied research demonstrating the integration, analysis, and modeling of the increasingly available open data for effective policy-making. Despite the abundance of such data, there is a noticeable absence of practical studies showcasing its potential use in policy decisions.

Second, the project aims to fill the void in practical research illustrating the application of geospatial data science and analysis (GDSA) in decision-making processes.

Therefore, the assignment involves conducting a case study to showcase the value of GDSA. This will involve synthesizing publicly accessible data from various sources to construct spatial interaction models. These models will be used to identify and analyze factors influencing the urban mobility patterns of public bus transit.

1.Getting Started

The purpose of pacman is to streamline package management in R. One of its key functions, p_load, is used to install (if not already installed) and then load the specified R packages.

Inside the p_load function, several packages are listed:

  • tmap: Used for creating thematic maps, which are useful in geospatial analysis.

  • sf: Used for handling and analyzing spatial data.

  • dplyr: Part of the tidyverse, this package provides functions for data manipulation and transformation.

  • DT: Used for interactive display of data tables in R.

  • stplanr: A package tailored for sustainable transport planning, including functions for analyzing spatial lines, networks, etc.

  • performance: Useful for assessing and checking the performance of statistical models.

  • mapview: Facilitates interactive viewing of spatial data in R.

  • ggpubr: Provides functions to create “ggplot2”-based publication ready plots.

  • tidyverse: A collection of R packages designed for data science, including data manipulation, exploration, and visualization.

pacman::p_load(tmap, sf, dplyr, DT, sp,
               stplanr, performance, mapview,
               ggpubr, tidyverse, httr,
               units, reshape2)

2.Data Importing

2.1Geospatial Data Importing

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...
mpsz <- write_rds(mpsz, "data/rds/mpsz.rds")

2.2Aspatial Data Importing

odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

2.3Extracting the Study Data

weekday6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekday6_9)

We will save the output in rds format for future used.

write_rds(weekday6_9, "data/rds/weekday6_9.rds")

The code chunk below will be used to import the save weekday6_9.rds into R environment.

weekday6_9 <- read_rds("data/rds/weekday6_9.rds")
weekday17_20 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 &
           TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekday17_20)
write_rds(weekday17_20, "data/rds/weekday17_20.rds")
weekday17_20 <- read_rds("data/rds/weekday17_20.rds")
weekend11_14 <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 &
           TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekend11_14)
write_rds(weekend11_14, "data/rds/weekend11_14.rds")
weekend11_14 <- read_rds("data/rds/weekend11_14.rds")
weekend16_19 <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 &
           TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekend16_19)
write_rds(weekend16_19, "data/rds/weekend16_19.rds")
weekend16_19 <- read_rds("data/rds/weekend16_19.rds")

4.Geospatial Data Wrangling

4.1Combining busstop and mpsz

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C)
mapview(busstop_mpsz)

4.2Creating Hexagon Layer

Create hexagonal grid based on the extent of mpsz dataset

hex_grid <- st_make_grid(mpsz, cellsize = 2 * 375 / sqrt(3), square = FALSE)

Convert the grid to an sf object and add an ID for each hexagon

hex_grid_sf <- st_sf(geometry = hex_grid) %>%
  mutate(hex_id = 1:length(hex_grid))

Count the number of bus stops in each hexagon

busstop_counts <- st_intersects(hex_grid_sf, busstop_mpsz, sparse = FALSE) %>% 
  apply(1, function(x) sum(x, na.rm = TRUE))

Add bus stop count to hex grid sf object

hex_grid_sf$bus_stop_count <- busstop_counts

Filter out hexagons with no bus stops

hex_grid_sf <- hex_grid_sf %>%
  filter(bus_stop_count > 0)
tmap_options(check.and.fix = TRUE)
map_hexagon <- tm_shape(hex_grid_sf) +
  tm_polygons(
    col = "bus_stop_count",
    palette = "Purples",
    style = "cont",
    title = "Number of Bus Stops",
    id = "hex_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.vars = c("Number of Bus Stops" = "bus_stop_count"),
    popup.format = list(bus_stop_count = list(format = "f", digits = 0))
  ) +
  tm_shape(mpsz) + 
  tm_borders(col = "grey75", lwd = 0.7) +
  tm_layout(
    main.title = "Bus Stop Distribution in Singapore",
    main.title.size = 1.5,
    legend.title.size = 1,
    legend.text.size = 0.8,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Data Source: LTA DataMall, Data.gov.sg", position = c("RIGHT", "BOTTOM"), size = 0.8) +
  tm_view(view.legend.position = c("left", "bottom"))

map_hexagon

Next, we are going to append the planning subzone code from busstop_mpsz data frame onto odbus6_9 data frame.

busstop_hex <- st_intersection(busstop, hex_grid_sf) %>%
  select(BUS_STOP_N, hex_id) %>%
  st_drop_geometry()
busstop_hex <- na.omit(busstop_hex)
head(busstop_hex)
     BUS_STOP_N hex_id
3269      25059    393
254       26379    444
2570      25751    488
4203      26389    490
2403      26369    491
2897      25761    535
write_rds(busstop_hex, "data/rds/busstop_hex.rds")  
rm(hex_grid, map_hexagon, odbus, busstop_counts)

5.Preparing Commute Flow Data

5.1Weekday Morning Peak

weekday_morning_od <- left_join(weekday6_9 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)

Before continue, it is a good practice for us to check for duplicating records.

duplicate <- weekday_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

If duplicated records are found, the code chunk below will be used to retain the unique records.

weekday_morning_od <- unique(weekday_morning_od)
weekday_morning_od <- left_join(weekday_morning_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekday_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekday_morning_od <- unique(weekday_morning_od)
weekday_morning_od <- weekday_morning_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKDAY_MORNING_PEAK = sum(TRIPS))

It is time to save the output into an rds file format.

write_rds(weekday_morning_od, "data/rds/weekday_morning_od.rds")
weekday_morning_od <- read_rds("data/rds/weekday_morning_od.rds")

5.2Weekday Afternoon Peak

weekday_afternoon_od <- left_join(weekday17_20 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)
duplicate <- weekday_afternoon_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekday_afternoon_od <- unique(weekday_afternoon_od)
weekday_afternoon_od <- left_join(weekday_afternoon_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekday_afternoon_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekday_afternoon_od <- unique(weekday_afternoon_od)
weekday_afternoon_od <- weekday_afternoon_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKDAY_AFTERNOON_PEAK = sum(TRIPS))
write_rds(weekday_afternoon_od, "data/rds/weekday_afternoon_od.rds")
weekday_afternoon_od <- read_rds("data/rds/weekday_afternoon_od.rds")

5.3Weekend Morning Peak

weekend_morning_od <- left_join(weekend11_14 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)
duplicate <- weekend_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_morning_od <- unique(weekend_morning_od)
weekend_morning_od <- left_join(weekend_morning_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekend_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_morning_od <- unique(weekend_morning_od)
weekend_morning_od <- weekend_morning_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKEND_MORNING_PEAK = sum(TRIPS))
write_rds(weekend_morning_od, "data/rds/weekend_morning_od.rds")
weekend_morning_od <- read_rds("data/rds/weekend_morning_od.rds")

5.4Weekend Evening Peak

weekend_evening_od <- left_join(weekend16_19 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)
duplicate <- weekend_evening_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_evening_od <- unique(weekend_evening_od)
weekend_evening_od <- left_join(weekend_evening_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekend_evening_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_evening_od <- unique(weekend_evening_od)
weekend_evening_od <- weekend_evening_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKEND_EVENING_PEAK = sum(TRIPS))
write_rds(weekend_evening_od, "data/rds/weekend_evening_od.rds")
weekend_evening_od <- read_rds("data/rds/weekend_evening_od.rds")
rm(weekday6_9, weekday17_20, weekend11_14, weekend16_19)

6.Visualising Spatial Interaction

6.1Removing Intra-zonal Flows

weekday_morning_od1 <- weekday_morning_od[weekday_morning_od$ORIGIN_HEX!=weekday_morning_od$DESTIN_HEX,]
weekday_afternoon_od1 <- weekday_afternoon_od[weekday_afternoon_od$ORIGIN_HEX!=weekday_afternoon_od$DESTIN_HEX,]
weekend_morning_od1 <- weekend_morning_od[weekend_morning_od$ORIGIN_HEX!=weekend_morning_od$DESTIN_HEX,]
weekend_evening_od1 <- weekend_evening_od[weekend_evening_od$ORIGIN_HEX!=weekend_evening_od$DESTIN_HEX,]

6.2Creating the Desire Lines

weekday_morning_flowLine <- od2line(flow = weekday_morning_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")
weekday_afternoon_flowLine <- od2line(flow = weekday_afternoon_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")
weekend_morning_flowLine <- od2line(flow = weekend_morning_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")
weekend_evening_flowLine <- od2line(flow = weekend_evening_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")

6.3Visualising the Desire Lines

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekday_morning_flowLine %>% 
             filter(WEEKDAY_MORNING_PEAK >= 5000)) +
  tm_lines(
    lwd = "WEEKDAY_MORNING_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 10),
    n = 6,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekday Morning Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekday_afternoon_flowLine %>% 
             filter(WEEKDAY_AFTERNOON_PEAK >= 5000)) +
  tm_lines(
    lwd = "WEEKDAY_AFTERNOON_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 10),
    n = 6,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekday Afternoon Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekend_morning_flowLine %>% 
             filter(WEEKEND_MORNING_PEAK >= 3000)) +
  tm_lines(
    lwd = "WEEKEND_MORNING_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 13, 15),
    n = 7,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekend Morning Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekend_evening_flowLine %>% 
             filter(WEEKEND_EVENING_PEAK >= 3000)) +
  tm_lines(
    lwd = "WEEKEND_EVENING_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 10),
    n = 6,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekend Evening Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)

rm(duplicate, weekday_morning_flowLine, weekday_morning_od, weekday_morning_od1,
   weekend_morning_flowLine, weekend_morning_od, weekend_morning_od1,
   weekend_evening_flowLine, weekend_evening_od, weekend_evening_od1)

7.Attractiveness Factors

7.1Data Integration

entertn <- st_read(dsn = "data/geospatial",
                      layer = "entertn")
Reading layer `entertn' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$entertn_count <- lengths(st_intersects(hex_grid_sf, entertn))
FB <- st_read(dsn = "data/geospatial",
                   layer = "F&B")
Reading layer `F&B' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$FB_count <- lengths(st_intersects(hex_grid_sf, FB))
lere <- st_read(dsn = "data/geospatial",
                   layer = "Liesure&Recreation")
Reading layer `Liesure&Recreation' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$lere_count <- lengths(st_intersects(hex_grid_sf, lere))
retail <- st_read(dsn = "data/geospatial",
                  layer = "Retails")
Reading layer `Retails' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$retail_count <- lengths(st_intersects(hex_grid_sf, retail))
trainexits <- st_read(dsn = "data/geospatial",
                      layer = "Train_Station_Exit_Layer")
Reading layer `Train_Station_Exit_Layer' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 565 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6134.086 ymin: 27499.7 xmax: 45356.36 ymax: 47865.92
Projected CRS: SVY21
trainexits <- st_transform(trainexits, st_crs(hex_grid_sf))
hex_grid_sf$trainexits_count <- lengths(st_intersects(hex_grid_sf, trainexits))
hdb <- read_csv("data/aspatial/hdb.csv")
residential <- hdb %>%
  filter(residential == "Y") %>%
  select(lat, lng, total_dwelling_units) %>%
  na.omit()
residential <- st_as_sf(residential, 
                           coords = c("lng", "lat"), 
                           crs = 4326) %>%
  st_transform(crs = st_crs(hex_grid_sf))
intersections <- st_intersects(hex_grid_sf, residential, sparse = TRUE)
hex_grid_sf$residential_count <- mapply(function(index, residential) {
  sum(residential$total_dwelling_units[index], na.rm = TRUE)
}, intersections, MoreArgs = list(residential = residential))
rm(intersections)

8.Propulsive Factors

8.1Data Integration

business <- st_read(dsn = "data/geospatial",
                      layer = "Business")
Reading layer `Business' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 6550 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3669.148 ymin: 25408.41 xmax: 47034.83 ymax: 50148.54
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$business_count <- lengths(st_intersects(hex_grid_sf, business))
url<-"https://www.onemap.gov.sg/api/common/elastic/search"

csv<-read_csv("data/aspatial/Generalinformationofschools.csv")
postcodes<-csv$`postal_code`

found<-data.frame()
not_found<-data.frame()

for(postcode in postcodes){
  query<-list('searchVal'=postcode,'returnGeom'='Y','getAddrDetails'='Y','pageNum'='1')
  res<- GET(url,query=query)
  
  if((content(res)$found)!=0){
    found<-rbind(found,data.frame(content(res))[4:13])
  } else{
    not_found = data.frame(postcode)
  }
}
merged = merge(csv, found, by.x = 'postal_code', by.y = 'results.POSTAL', all = TRUE)
write.csv(merged, file = "data/aspatial/schools.csv")
write.csv(not_found, file = "data/aspatial/not_found.csv")
schools <- read_csv("data/aspatial/schools.csv") %>%
  rename(latitude = "results.LATITUDE",
         longitude = "results.LONGITUDE")%>%
  select(postal_code, school_name, latitude, longitude)
schools <- schools %>%
  filter(!is.na(longitude) & !is.na(latitude)) %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
  st_transform(crs = 3414)
hex_grid_sf$school_count <- lengths(st_intersects(hex_grid_sf, schools))
rm(csv, found, merged, not_found, query, res,
   postcode, postcodes, url)
finserv <- st_read(dsn = "data/geospatial",
                      layer = "FinServ")
Reading layer `FinServ' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3320 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4881.527 ymin: 25171.88 xmax: 46526.16 ymax: 49338.02
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$finserv_count <- lengths(st_intersects(hex_grid_sf, finserv))

8.2Final Check

datatable(hex_grid_sf)

9.Computing Distance Matrix

9.1Converting from sf data.table to SpatialPolygonsDataFrame

hex_grid_sp <- as(hex_grid_sf, "Spatial")
hex_grid_sp
class       : SpatialPolygonsDataFrame 
features    : 1853 
extent      : 3966.576, 48566.88, 26373.72, 50123.72  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 11
names       : hex_id, bus_stop_count, entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count, business_count, school_count, finserv_count 
min values  :    393,              1,             0,        0,          0,            0,                0,                 0,              0,            0,             0 
max values  :   9988,             13,             8,       71,         26,          986,                9,              3531,             60,            4,            91 

9.2 Computing the distance matrix

dist <- spDists(hex_grid_sp, 
                longlat = FALSE)
head(dist, n=c(10, 10))
           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
 [1,]    0.0000 2633.9134  866.0254 2291.2878 3031.0889  750.0000 1984.3135
 [2,] 2633.9134    0.0000 1887.4586  433.0127  433.0127 2291.2878  866.0254
 [3,]  866.0254 1887.4586    0.0000 1500.0000 2250.0000  433.0127 1145.6439
 [4,] 2291.2878  433.0127 1500.0000    0.0000  750.0000 1887.4586  433.0127
 [5,] 3031.0889  433.0127 2250.0000  750.0000    0.0000 2633.9134 1145.6439
 [6,]  750.0000 2291.2878  433.0127 1887.4586 2633.9134    0.0000 1500.0000
 [7,] 1984.3135  866.0254 1145.6439  433.0127 1145.6439 1500.0000    0.0000
 [8,] 4175.8233 1561.2495 3381.9373 1887.4586 1145.6439 3750.0000 2250.0000
 [9,] 1732.0508 1299.0381  866.0254  866.0254 1561.2495 1145.6439  433.0127
[10,] 2410.9127  750.0000 1561.2495  433.0127  866.0254 1887.4586  433.0127
          [,8]      [,9]     [,10]
 [1,] 4175.823 1732.0508 2410.9127
 [2,] 1561.249 1299.0381  750.0000
 [3,] 3381.937  866.0254 1561.2495
 [4,] 1887.459  866.0254  433.0127
 [5,] 1145.644 1561.2495  866.0254
 [6,] 3750.000 1145.6439 1887.4586
 [7,] 2250.000  433.0127  433.0127
 [8,]    0.000 2633.9134 1887.4586
 [9,] 2633.913    0.0000  750.0000
[10,] 1887.459  750.0000    0.0000

9.3Labeling Column and Row Headers of a Distance Matrix

hex_names <- hex_grid_sf$hex_id
colnames(dist) <- paste0(hex_names)
rownames(dist) <- paste0(hex_names)

9.4Pivoting Distance Value by HEX_ID

distPair <- melt(dist) %>%
  rename(dist = value)
head(distPair, 10)
   Var1 Var2      dist
1   393  393    0.0000
2   444  393 2633.9134
3   488  393  866.0254
4   490  393 2291.2878
5   491  393 3031.0889
6   535  393  750.0000
7   537  393 1984.3135
8   540  393 4175.8233
9   583  393 1732.0508
10  584  393 2410.9127

9.5Updating Intra-zonal Distances

distPair %>%
  filter(dist > 0) %>%
  summary()
      Var1           Var2           dist      
 Min.   : 393   Min.   : 393   Min.   :  433  
 1st Qu.:3694   1st Qu.:3694   1st Qu.: 7949  
 Median :5474   Median :5474   Median :12933  
 Mean   :5236   Mean   :5236   Mean   :13705  
 3rd Qu.:6837   3rd Qu.:6837   3rd Qu.:18407  
 Max.   :9988   Max.   :9988   Max.   :44478  
distPair$dist <- ifelse(distPair$dist == 0,
                        200, distPair$dist)
distPair %>%
  summary()
      Var1           Var2           dist      
 Min.   : 393   Min.   : 393   Min.   :  200  
 1st Qu.:3694   1st Qu.:3694   1st Qu.: 7949  
 Median :5474   Median :5474   Median :12933  
 Mean   :5236   Mean   :5236   Mean   :13698  
 3rd Qu.:6837   3rd Qu.:6837   3rd Qu.:18407  
 Max.   :9988   Max.   :9988   Max.   :44478  
distPair <- distPair %>%
  rename(orig = Var1,
         dest = Var2)
write_rds(distPair, "data/rds/distPair.rds") 

10.Combining Passenger Volume Data with Distance Value

weekday_afternoon_od1$ORIGIN_HEX <- as.factor(weekday_afternoon_od1$ORIGIN_HEX)
weekday_afternoon_od1$DESTIN_HEX <- as.factor(weekday_afternoon_od1$DESTIN_HEX)
distPair$orig <- as.factor(distPair$orig)
distPair$dest <- as.factor(distPair$dest)
weekday_afternoon_od2 <- weekday_afternoon_od1 %>%
  left_join (distPair,
             by = c("ORIGIN_HEX" = "orig",
                    "DESTIN_HEX" = "dest"))
hex_grid_df <- as.data.frame(hex_grid_sf) %>%
  select(hex_id, bus_stop_count, business_count, school_count, finserv_count, 
         entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count) %>%
  mutate(hex_id = as.character(hex_id))
origin_factors <- hex_grid_df %>%
  select(hex_id, bus_stop_count, business_count, school_count, finserv_count)
weekday_afternoon_od2 <- weekday_afternoon_od2 %>%
  mutate(ORIGIN_HEX = as.character(ORIGIN_HEX),
         DESTIN_HEX = as.character(DESTIN_HEX))
weekday_afternoon_od2_with_origin <- weekday_afternoon_od2 %>%
  left_join(origin_factors, by = c("ORIGIN_HEX" = "hex_id"))
destin_factors <- hex_grid_df %>%
  select(hex_id, entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count)
weekday_afternoon_od2_complete <- weekday_afternoon_od2_with_origin %>%
  left_join(destin_factors, by = c("DESTIN_HEX" = "hex_id"))
glimpse(weekday_afternoon_od2_complete)
Rows: 161,671
Columns: 14
Groups: ORIGIN_HEX [1,808]
$ ORIGIN_HEX             <chr> "393", "393", "393", "393", "393", "393", "393"…
$ DESTIN_HEX             <chr> "535", "585", "723", "770", "779", "824", "827"…
$ WEEKDAY_AFTERNOON_PEAK <dbl> 3, 34, 182, 1, 1, 18, 6, 145, 2, 16, 250, 23, 2…
$ dist                   <dbl> 750.000, 3122.499, 1561.249, 1887.459, 7697.402…
$ bus_stop_count         <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ business_count         <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0,…
$ school_count           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ finserv_count          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ entertn_count          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ FB_count               <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ lere_count             <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ retail_count           <int> 0, 1, 0, 0, 0, 0, 3, 0, 2, 0, 3, 9, 2, 1, 0, 0,…
$ trainexits_count       <int> 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0,…
$ residential_count      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
write_rds(weekday_afternoon_od2_complete, "data/rds/SIM_data.rds")
rm(business, busstop, busstop_hex, busstop_mpsz, destin_factors, dist, distPair, entertn, FB, finserv, hdb, hex_grid_df, hex_grid_sf, hex_grid_sp, lere, mpsz, origin_factors, residential, retail, schools, trainexits, weekday_afternoon_flowLine, weekday_afternoon_od, weekday_afternoon_od1, weekday_afternoon_od2, weekday_afternoon_od2_complete, weekday_afternoon_od2_with_origin, hex_names)

11.Calibrating Spatial Interaction Models

Importing the modelling data

SIM_data <- read_rds("data/rds/SIM_data.rds")

Visualising the dependent variable

ggplot(data = SIM_data,
       aes(x = WEEKDAY_AFTERNOON_PEAK)) +
  geom_histogram()

ggplot(data = SIM_data,
       aes(x = dist,
           y = WEEKDAY_AFTERNOON_PEAK)) +
  geom_point() +
  geom_smooth(method = lm)

ggplot(data = SIM_data,
       aes(x = log(dist),
           y = log(WEEKDAY_AFTERNOON_PEAK))) +
  geom_point() +
  geom_smooth(method = lm)

Checking for Variables with Zero Values

summary(SIM_data)
  ORIGIN_HEX         DESTIN_HEX        WEEKDAY_AFTERNOON_PEAK      dist      
 Length:161671      Length:161671      Min.   :    1.0        Min.   :  433  
 Class :character   Class :character   1st Qu.:    4.0        1st Qu.: 2411  
 Mode  :character   Mode  :character   Median :   18.0        Median : 4684  
                                       Mean   :  148.4        Mean   : 5658  
                                       3rd Qu.:   73.0        3rd Qu.: 8020  
                                       Max.   :59391.0        Max.   :25159  
 bus_stop_count   business_count    school_count    finserv_count   
 Min.   : 1.000   Min.   : 0.000   Min.   :0.0000   Min.   : 0.000  
 1st Qu.: 2.000   1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.: 0.000  
 Median : 3.000   Median : 0.000   Median :0.0000   Median : 1.000  
 Mean   : 3.401   Mean   : 2.556   Mean   :0.1795   Mean   : 3.944  
 3rd Qu.: 4.000   3rd Qu.: 2.000   3rd Qu.:0.0000   3rd Qu.: 4.000  
 Max.   :13.000   Max.   :60.000   Max.   :4.0000   Max.   :91.000  
 entertn_count       FB_count        lere_count       retail_count   
 Min.   :0.0000   Min.   : 0.000   Min.   : 0.0000   Min.   :  0.00  
 1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.:  2.00  
 Median :0.0000   Median : 0.000   Median : 0.0000   Median :  7.00  
 Mean   :0.1352   Mean   : 2.135   Mean   : 0.8934   Mean   : 38.84  
 3rd Qu.:0.0000   3rd Qu.: 1.000   3rd Qu.: 1.0000   3rd Qu.: 31.00  
 Max.   :8.0000   Max.   :71.000   Max.   :26.0000   Max.   :986.00  
 trainexits_count residential_count
 Min.   :0.0000   Min.   :   0.0   
 1st Qu.:0.0000   1st Qu.:   0.0   
 Median :0.0000   Median : 219.0   
 Mean   :0.6436   Mean   : 674.6   
 3rd Qu.:0.0000   3rd Qu.:1264.0   
 Max.   :9.0000   Max.   :3531.0   
SIM_data$business_count <- ifelse(
  SIM_data$business_count == 0,
  0.99, SIM_data$business_count)
SIM_data$school_count <- ifelse(
  SIM_data$school_count == 0,
  0.99, SIM_data$school_count)
SIM_data$finserv_count <- ifelse(
  SIM_data$finserv_count == 0,
  0.99, SIM_data$finserv_count)
SIM_data$entertn_count <- ifelse(
  SIM_data$entertn_count == 0,
  0.99, SIM_data$entertn_count)
SIM_data$FB_count <- ifelse(
  SIM_data$FB_count == 0,
  0.99, SIM_data$FB_count)
SIM_data$lere_count <- ifelse(
  SIM_data$lere_count == 0,
  0.99, SIM_data$lere_count)
SIM_data$retail_count <- ifelse(
  SIM_data$retail_count == 0,
  0.99, SIM_data$retail_count)
SIM_data$trainexits_count <- ifelse(
  SIM_data$trainexits_count == 0,
  0.99, SIM_data$trainexits_count)
SIM_data$residential_count <- ifelse(
  SIM_data$residential_count == 0,
  0.99, SIM_data$residential_count)
summary(SIM_data)
  ORIGIN_HEX         DESTIN_HEX        WEEKDAY_AFTERNOON_PEAK      dist      
 Length:161671      Length:161671      Min.   :    1.0        Min.   :  433  
 Class :character   Class :character   1st Qu.:    4.0        1st Qu.: 2411  
 Mode  :character   Mode  :character   Median :   18.0        Median : 4684  
                                       Mean   :  148.4        Mean   : 5658  
                                       3rd Qu.:   73.0        3rd Qu.: 8020  
                                       Max.   :59391.0        Max.   :25159  
 bus_stop_count   business_count    school_count   finserv_count   
 Min.   : 1.000   Min.   : 0.990   Min.   :0.990   Min.   : 0.990  
 1st Qu.: 2.000   1st Qu.: 0.990   1st Qu.:0.990   1st Qu.: 0.990  
 Median : 3.000   Median : 0.990   Median :0.990   Median : 1.000  
 Mean   : 3.401   Mean   : 3.138   Mean   :1.018   Mean   : 4.391  
 3rd Qu.: 4.000   3rd Qu.: 2.000   3rd Qu.:0.990   3rd Qu.: 4.000  
 Max.   :13.000   Max.   :60.000   Max.   :4.000   Max.   :91.000  
 entertn_count      FB_count        lere_count      retail_count   
 Min.   :0.990   Min.   : 0.990   Min.   : 0.990   Min.   :  0.99  
 1st Qu.:0.990   1st Qu.: 0.990   1st Qu.: 0.990   1st Qu.:  2.00  
 Median :0.990   Median : 0.990   Median : 0.990   Median :  7.00  
 Mean   :1.045   Mean   : 2.862   Mean   : 1.539   Mean   : 38.99  
 3rd Qu.:0.990   3rd Qu.: 1.000   3rd Qu.: 1.000   3rd Qu.: 31.00  
 Max.   :8.000   Max.   :71.000   Max.   :26.000   Max.   :986.00  
 trainexits_count residential_count
 Min.   :0.990    Min.   :   0.99  
 1st Qu.:0.990    1st Qu.:   0.99  
 Median :0.990    Median : 219.00  
 Mean   :1.399    Mean   : 675.02  
 3rd Qu.:0.990    3rd Qu.:1264.00  
 Max.   :9.000    Max.   :3531.00  

Unconstrained Spatial Interaction Model

uncSIM <- glm(formula = WEEKDAY_AFTERNOON_PEAK ~ 
                log(bus_stop_count) + 
                log(business_count) +
                log(school_count) +
                log(finserv_count) +
                log(entertn_count) +
                log(FB_count) +
                log(lere_count) +
                log(retail_count) +
                log(trainexits_count) +
                log(residential_count) +
                log(dist),
              family = poisson(link = "log"),
              data = SIM_data,
              na.action = na.exclude)
uncSIM

Call:  glm(formula = WEEKDAY_AFTERNOON_PEAK ~ log(bus_stop_count) + 
    log(business_count) + log(school_count) + log(finserv_count) + 
    log(entertn_count) + log(FB_count) + log(lere_count) + log(retail_count) + 
    log(trainexits_count) + log(residential_count) + log(dist), 
    family = poisson(link = "log"), data = SIM_data, na.action = na.exclude)

Coefficients:
           (Intercept)     log(bus_stop_count)     log(business_count)  
              11.67751                 0.39921                -0.06785  
     log(school_count)      log(finserv_count)      log(entertn_count)  
              -0.30247                 0.42730                 0.05798  
         log(FB_count)         log(lere_count)       log(retail_count)  
              -0.18000                -0.10078                 0.05821  
 log(trainexits_count)  log(residential_count)               log(dist)  
               0.69354                 0.12207                -1.04914  

Degrees of Freedom: 161670 Total (i.e. Null);  161659 Residual
Null Deviance:      96290000 
Residual Deviance: 59350000     AIC: 60130000

R-squared Function

CalcRSquared <- function(observed,estimated){
  r <- cor(observed,estimated)
  R2 <- r^2
  R2
}
CalcRSquared(uncSIM$data$WEEKDAY_AFTERNOON_PEAK, uncSIM$fitted.values)
[1] 0.1063211
r2_mcfadden(uncSIM)
# R2 for Generalized Linear Regression
       R2: 0.381
  adj. R2: 0.381